As new and returning residents of New York, we are intrigued by the bustling rat population around us and across each borough. We are interested in exploring the impact of factors like day of the week, month, borough, latitude, longitude, and type of building has on rat sighting around New York. By understanding what influences the amount and location of rat sightings, we will be able to know which areas of the city to avoid and which areas may have the world’s best ratatouille.
Brown rats are not indigenous to New York; they are a product of the colonization of North America and were brought over on ships in the 18th century. The brown rat is the most common variety in New York today and slowly overtook the black rat population due to their highly aggressive and dominant nature. Brown rats’ ability to tunnel through and eat just about anything has lead them to be one of the top pest in New York for millennia. A testament to their cultural relevance, the Rolling Stones 1978 record, Shattered, made a reference to the rats of New York City: “We’ve got rats on the west side”. More recently, these omnivorous creatures have even spurred the creation of a city government position titled Director of Rodent Mitigation or, colloquially, “Rat Czar”.
We set out to answer the following questions:
Throughout the course of working on the project, we became interested in sightings year-over-year and included some time series plots in the analysis.
Our is publicly available from Open Data NYC (https://data.cityofnewyork.us/Social-Services/Rat-Sightings/3q43-55fe), downloaded in November, 2023. The raw data contains 232,090 records of rat sightings and variables relating to geographical location, type of location, and time of sighting. In order to begin the data cleaning and analysis process, we loaded the following libraries:
tidyverselubridatereadrxtsRColorBrewerggthemesgridExtraleaflethighcharterscaleslibrary(tidyverse)
library(lubridate)
library(readr)
library(xts)
library("RColorBrewer")
library("ggthemes")
library("gridExtra")
library("leaflet")
library(leaflet.extras)
library("highcharter")
library(scales)
We begin by importing the rat sightings data using the
read_csv function, clean up the variable names with the
clean_names function, and create some more useful date
variables in a mutate pipeline.
rats_raw <- read_csv("./Rat_Sightings.csv", na = c("", "NA", "N/A", "Unspecified")) %>%
janitor::clean_names() %>%
mutate(created_date = mdy_hms(created_date)) %>%
mutate(sighting_year = year(created_date),
sighting_month_num = month(created_date),
sighting_month = month(created_date, label = TRUE, abbr = FALSE),
sighting_day = day(created_date),
sighting_weekday = wday(created_date, label = TRUE, abbr = FALSE))
There are 232,090 records of rat sightings, ranging from 2010 to 2023 and across all 5 boroughs.
Important variables to our analysis include:
created_date: Date of rat sighting recordsighting_year: Year of sightingsighting_month: Month of sightingsighting_day: Sighting day of the monthsighting_weekday: Sighting day of the weeklocation_type: Rat sighting location type (Government
Building, 3+ Family Apt. Building, Construction site, etc.)city: City of sightingborough: Borough of sightinglatitude: Latitude of sightinglongitude: Longitude of sightingWe first explored how rat sightings vary over time (month, day of the week, year) and how rat sightings vary by borough. To do so, we used simple tables, bar charts, line plots, heat maps, and interactive maps.
by_year <- rats_raw %>%
group_by(sighting_year) %>%
count() %>%
ggplot(aes(x = sighting_year, y = n, fill = n)) +
geom_histogram(stat = "identity", position = "dodge") +
theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 12)) +
xlab("Year") +
ylab("Count") +
geom_text(aes(label = n), vjust = -0.1, size = 3.75) +
ggtitle('Count of Rat Sightings through the Years') +
scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral')))
## Warning in geom_histogram(stat = "identity", position = "dodge"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
by_year
We see a substantial increase in the number of rat sightings after 2020. This increase is consistent with the city of New York’s rat media coverage and the impact of the COVID-19 pandemic. With more restaurants closed and more restaurants offering outdoor dining, rats are more likely to scavenge outside. A warmer, wetter than usual summer in 2021 also contributed to favorable rat conditions.
by_month <- rats_raw %>%
group_by(sighting_month) %>%
count() %>%
ggplot(aes(x = sighting_month, y = n, fill = n)) +
geom_histogram(stat = "identity", position = "dodge") +
theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 9)) +
xlab("Month") +
ylab("Count") +
geom_text(aes(label = n), vjust = -0.1, size = 3.75) +
ggtitle('Count of Rat Sightings by Month') +
scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral')))
## Warning in geom_histogram(stat = "identity", position = "dodge"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
by_month
The most rat sightings are in the summer months with a peak in July. Sightings taper off in the fall, reaching a low in December, and then start to increase in the spring. Warmer weather is more favorable to rat survival and helps their populations grow.
by_day <- rats_raw %>%
group_by(sighting_weekday) %>%
count() %>%
ggplot(aes(x = sighting_weekday, y = n, fill = n)) +
geom_histogram(stat = "identity", position = "dodge") +
theme(legend.position ='none',axis.title = element_text(),axis.text.x = element_text(size = 12)) +
xlab("Weekday") +
ylab("Count") +
geom_text(aes(label = n), vjust = -0.1, size = 4) +
ggtitle('Count of Rat Sightings by Day of Week') +
scale_fill_gradientn(name = '',colours = rev(brewer.pal(10,'Spectral')))
## Warning in geom_histogram(stat = "identity", position = "dodge"): Ignoring
## unknown parameters: `binwidth`, `bins`, and `pad`
by_day
Weekdays have the most rat sightings, peaking on Mondays and staying relatively high throughout the week, while weekends have much lower counts.
for_location_type <- rats_raw %>%
drop_na(location_type) %>%
filter(location_type != "Other (Explain Below)") %>%
group_by(location_type) %>%
mutate(count_loc = n()) %>%
ungroup() %>%
filter(location_type %in% c("3+ Family Apt. Building", "1-2 Family Dwelling", "3+ Family Mixed Use Building", "Commercial Building", "Vacant Lot", "Construction Site"))
ggplot(data = for_location_type, aes(x = fct_infreq(location_type))) +
geom_bar() +
theme_minimal() +
coord_flip() +
labs(title = "Top 6 Location Types for Sightings",
x = "Location Type",
y = "Count")
The above shows the top 6 location types for rat sightings. 3+ Family Apt. Buildings report the highest amount of rat sightings among all location types, while 1-2 Family Dwellings and 3+ Family Mixed Use Buildings report the next two highest amount of sightings. These location types are followed by commercial buildings, vacant lots, and construction sites.
Include?
In order to display rat sightings across New York City, we opted to create interactive maps. The first shows all rat sightings and their geo-location while the second is a heat map.
## Overall Sightings Map and Heat Map
top = 40.917577 # north lat
left = -74.259090 # west long
right = -73.700272 # east long
bottom = 40.477399 # south lat
nyc = rats_raw %>%
filter(latitude >= bottom) %>%
filter ( latitude <= top) %>%
filter( longitude >= left ) %>%
filter(longitude <= right)
center_lon = median(nyc$longitude,na.rm = TRUE)
center_lat = median(nyc$latitude,na.rm = TRUE)
# count = data %>%
# group_by(location) %>%
# count()
#
# count
#
# nyc = merge(nyc, count, by = "location")
factpal = colorFactor("blue", nyc$n)
# nyc %>%
# leaflet() %>%
# addProviderTiles("Esri.NatGeoWorldMap") %>%
# addCircles(lng = ~longitude, lat = ~latitude) %>%
# setView(lng=center_lon, lat=center_lat,zoom = 10)
nyc %>%
leaflet() %>%
addProviderTiles("Esri.NatGeoWorldMap") %>%
addHeatmap(lng = ~longitude, lat = ~latitude, intensity = ~(nyc$n), blur = 20, max = 0.05, radius = 15) %>%
setView(lng=center_lon, lat=center_lat,zoom = 10)
When analyzing our data, we considered many different factors that could be contributors to varying rat sightings in NYC.
We believe that the exponential increase in rat sightings at the end of 2020 and going into 2021 could be a result of the pandemic. Since most restaurants and establishments were closed and less people filled the streets during the lock down, the rats were able to scavenge outside without interruptions. They were most likely able to find food, shelter, and reproduce without people and cars scaring them away. This could have led to an increase in the rat population and more rat sightings. The pandemic also led to an increase in outdoor dining, and these dining areas are still being used today. This could explain the continued high number of rat sightings since there is more food scraps and crumbs outside for the rats to find.
More rat sightings are also observed during the summer months which can be explained by warmer and wetter weather than the rest of the year. Since the weather is warm, the rats are able to scurry around the streets comfortably instead of having to shelter for warmth and safety. Due to temperature, rats normally have their babies in the spring months which also explains an increase in rat sightings during the summer.
We also found that rat sightings are highest during the weekdays (especially towards the beginning) and lowest during the weekend. This can be attributed to people leaving their houses more on the weekdays to go to work, errands, and other obligations, compared to staying in their houses on the weekends.
When analyzing rat sightings by location type, we see that rats are seen most in 3+ family apt. buildings. This may be because these types of apartments are usually large buildings that are maintained with comfortable living conditions such as heat/air conditioning, clean environment, kitchens full of food, etc. That sounds like a perfect place for a rat to settle down. The number of rat sightings are followed by 1-2 family dwellings and 3+ family mixed use buildings which can be explained by the same argument as above. The rats are spotted less often in commercial buildings, vacant lots, and constructions sights. This is most likely because these types of locations are not conducive for a rat to survive. There is most likely little to no food available for them and harsh/uncomfortable living conditions. They especially would not want to nest around constructions sites since it is dangerous and sterile.
Our heat map of rat sightings shows that rats have pretty much taken over the entire city. However, as you zoom in, there are more dense pockets of rat sightings which could be contributed to restaurants or apartment buildings. There does not seem to be one borough that has significantly less rats than the rest.